Files included
------------------------------------------------------------------------------
Stata do-file total_110519_final.do and the corresponding log file total_110519_final.log produced by running it.
All data manipulation and the generation of the final dataset are included in this one file.
Comments describe what is done in each part of the code. All results presented in the paper are labeled accordingly.

The code was run with Stata/SE 12.1 on Windows XP.


Data
------------------------------------------------------------------------------
Due to the confidential nature of the data, all datasets reside at Statistics Finland. The Stata code was run at Statistics Finland by its personnel.
Use rights to the data can be obtained from Statistics Finland (contact: tutkijapalvelut@tilastokeskus.fi;
further information: http://www.stat.fi/tup/mikroaineistot/index_en.html).
You can also contact the authors (lottavaan@gmail.com) for assistance regarding the data.


Datasets used
------------------------------------------------------------------------------
uinv_linked.dta - patent and inventor data from the NBER patents and citations data file. Includes all patents with country code "FI" and an application year between 1988 and 1999.
ht8804_inv.dta - FLEED employee data for the inventors, Statistics Finland.
kontr_otos4.dta - FLEED employee data for a control sample of employees, Statistics Finland. Sample of employees selected from the same set of firms where the inventors are employed.
tp8804.dta - firm financial data, Statistics Finland.
rd8505.dta - firm R&D data, Statistics Finland (based on the R&D survey).
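
The patent selection rule described for uinv_linked.dta can be sketched as follows. This is only an illustration in Python, not the code used in the do-file: the field name "country" is hypothetical (the README does not name the country-code variable), hakuvuosi is the application year documented in the patents section, and treating both year bounds as inclusive is an assumption.

```python
def in_patent_sample(country, hakuvuosi):
    """Return True if a patent matches the selection rule in this README.

    Assumptions: `country` is a hypothetical field holding the country
    code, and the 1988-1999 application-year bounds are inclusive.
    """
    return country == "FI" and 1988 <= hakuvuosi <= 1999


# Made-up records, for illustration only.
patents = [
    {"country": "FI", "hakuvuosi": 1988},
    {"country": "FI", "hakuvuosi": 2000},
    {"country": "SE", "hakuvuosi": 1995},
]
selected = [p for p in patents if in_patent_sample(p["country"], p["hakuvuosi"])]
```

With the made-up records above, only the first one is kept.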


Variable descriptions for variables used from the original data (generated variables defined in the do-file)
------------------------------------------------------------------------------
1. FLEED EMPLOYEE DATA (kontr_otos4.dta, ht8804_inv.dta)
------------------------------------------------------------------------------

shtun - personal identifier (encrypted)
vuosi - year
sp - gender, 1=male, 2=female
ika - age
kieli - native language, 1=Finnish, 2=Swedish, 0=other, 9=unknown
kans - nationality, 1=Finnish, 2=other

syrtun - company identifier of the employer (encrypted)
ptoim1 - Labor market status (main type of activity)
	11=employed, 
	12=unemployed, 
	21=0-14 yrs old, 
	22=pupil, student, 
	24=pensioner, 
	25=conscript (army), 
	29=unemployment pensioner, 
	99=other out of labor force

amas1 - Employment status, 1=wage and salary earner, 2=entrepreneur
apvm1 - Date of start of employment

(syrtun, ptoim1, amas1 and apvm1 refer to the situation at the end of the year)

tyokk - Number of months employed during the year
ttyotu - Earned income. The sum of wage and entrepreneurial income received by households and income recipients during the year.
tyrtu - Entrepreneurial income. Entrepreneurial income includes income from agriculture and forestry, business activity and business group and copyright fees.
svatva - Income subject to state taxation. Includes wage income, entrepreneurial income and other income subject to state taxation (not capital income). The information is based on data in the tax files of the National Board of Inland Revenue concerning income subject to state taxation.
svatvp - Taxed capital income.
nuts2 - NUTS 2 regional classification (5 dummies: Eastern, Southern, Western, and Northern Finland, and Åland)

ututku - Level and field of education (2-digit code)
	The first digit of the variable UTUTKU identifies the level of education. 
	The second digit identifies the field of education.
	
	Level of education  
	0	Pre-primary education
	1	Primary education
	2	Lower secondary education
	3	Upper secondary level education
	5	Lowest level tertiary education
	6	Lower-degree level tertiary education
	7	Higher-degree level tertiary education
	8	Doctorate or equivalent level tertiary
	9	Level of education unknown

	Field of education
	0	General Education
	1	Teacher Education and Educational Science
	2	Humanities and Arts
	3	Social Sciences and Business
	4	Natural Sciences
	5	Technology
	6	Agriculture and Forestry
	7	Health and Welfare
	8	Services
	9	Not known or unspecified
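
The two-digit structure of ututku described above can be decoded as in the following sketch. It is an illustration in Python, not part of the do-file: the function name decode_ututku is hypothetical, the labels are copied from the lists above, and padding one-digit values with a leading zero (for the case where the code is stored numerically) is an assumption.

```python
# Labels copied from the level and field lists in this README.
EDUCATION_LEVELS = {
    0: "Pre-primary education",
    1: "Primary education",
    2: "Lower secondary education",
    3: "Upper secondary level education",
    5: "Lowest level tertiary education",
    6: "Lower-degree level tertiary education",
    7: "Higher-degree level tertiary education",
    8: "Doctorate or equivalent level tertiary",
    9: "Level of education unknown",
}

EDUCATION_FIELDS = {
    0: "General Education",
    1: "Teacher Education and Educational Science",
    2: "Humanities and Arts",
    3: "Social Sciences and Business",
    4: "Natural Sciences",
    5: "Technology",
    6: "Agriculture and Forestry",
    7: "Health and Welfare",
    8: "Services",
    9: "Not known or unspecified",
}


def decode_ututku(code):
    """Split a 2-digit ututku code into (level label, field label).

    Hypothetical helper. Accepts an int or a string; a one-digit value
    is padded with a leading zero, which is an assumption about how the
    code might be stored numerically.
    """
    text = str(code).strip().zfill(2)
    if len(text) != 2 or not text.isdigit():
        raise ValueError(f"expected a 2-digit code, got {code!r}")
    level = int(text[0])   # first digit: level of education
    field = int(text[1])   # second digit: field of education
    return (
        EDUCATION_LEVELS.get(level, "Level code not listed"),
        EDUCATION_FIELDS.get(field, "Field code not listed"),
    )
```

For example, decode_ututku(75) gives the labels for level 7 and field 5, that is, higher-degree level tertiary education in technology.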

------------------------------------------------------------------------------
2. FIRM FINANCIAL DATA (tp8804.dta)
------------------------------------------------------------------------------
syrtun - company identifier (encrypted)
vuosi - year
tphenk - number of employees (based on the financial statements)
tplv - annual revenue in euros (based on the financial statements)
tol95 - Industry classification (Standard Industrial Classification TOL 1995)

------------------------------------------------------------------------------
3. FIRM R&D DATA (rd8505.dta)
------------------------------------------------------------------------------
syrtun - company identifier (encrypted)
vuosi - year
ysyht - Total internal R&D expenditure (euros)

------------------------------------------------------------------------------
4. PATENTS (uinv_linked.dta)
------------------------------------------------------------------------------
gyear - grant year of patent
syrtun_pat - company identifier of the patent assignee firm (encrypted)
syrtun_fleed - company identifier of the patent inventor's employer (encrypted)
spatent - patent identifier (encrypted)
creceive1 - number of citations received (from 2002 NBER data)
hakuvuosi - application year of patent
cat - technological category of patent
asscode - assignee type code
     1    = unassigned
     2    = assigned to a U.S. nongovernment organization
     3    = assigned to a non-U.S., nongovernment organization
     4    = assigned to a U.S. individual
     5    = assigned to a non-U.S. individual
     6    = assigned to the U.S. (Federal) Government
     7    = assigned to a non-U.S. government
     8,9  = assigned to a U.S. non-Federal Government agency (do not appear in the dataset)

no_inv - number of inventors listed on the patent

